The Forest or the Trees? Tackling Simpson's Paradox with Classification and Regression Trees


Related articles

The Forest or the Trees? Tackling Simpson's Paradox in Big Data Using Trees

Prediction and variable selection are major uses of data mining algorithms but they are rarely the focus in causal IS research. Because experiments are often impossible, unethical or expensive to perform, causal IS research often relies on observational data. A major challenge is to infer causality from such data. Simpson’s paradox can arise in such contexts, causing uncertainty regarding the r...


Tackling Simpson's Paradox in Big Data using Classification & Regression Trees

This work aims to find potential Simpson's paradoxes in Big Data. Simpson's paradox (SP) arises when choosing the level of data aggregation for causal inference. It describes the phenomenon in which the direction of a cause's effect is reversed when examining the aggregate versus the disaggregates of a sample or population. The practical decision-making dilemma that SP raises is which level of...
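
The reversal described above can be checked directly on a small two-group table. The sketch below is illustrative only: the counts are made-up numbers chosen to produce a reversal, the function names are hypothetical, and it is a plain aggregate-versus-subgroup comparison, not the tree-based search the listed papers propose.

```python
from fractions import Fraction

# Illustrative (hypothetical) counts: (successes, trials) per arm within each subgroup.
data = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    """Success rate as an exact fraction."""
    return Fraction(successes, trials)

def subgroup_differences(table):
    """Rate(A) - Rate(B) within each subgroup (disaggregated view)."""
    return {g: rate(*arms["A"]) - rate(*arms["B"]) for g, arms in table.items()}

def aggregate_difference(table):
    """Rate(A) - Rate(B) after pooling all subgroups (aggregated view)."""
    totals = {"A": [0, 0], "B": [0, 0]}
    for arms in table.values():
        for arm, (s, n) in arms.items():
            totals[arm][0] += s
            totals[arm][1] += n
    return rate(*totals["A"]) - rate(*totals["B"])

def is_simpsons_paradox(table):
    """True if every subgroup difference has the opposite sign of the aggregate difference."""
    agg = aggregate_difference(table)
    if agg == 0:
        return False
    return all(d != 0 and (d > 0) != (agg > 0) for d in subgroup_differences(table).values())

if __name__ == "__main__":
    for g, d in subgroup_differences(data).items():
        print(f"{g}: A - B = {float(d):+.3f}")
    print(f"aggregate: A - B = {float(aggregate_difference(data)):+.3f}")
    print("Simpson's paradox:", is_simpsons_paradox(data))
```

Here arm A has the higher success rate within both subgroups, yet arm B has the higher rate once the subgroups are pooled, because B's trials are concentrated in the easier "small" subgroup.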


CORT: classification or regression trees

In this paper we challenge three of the underlying principles of CART, a well-known approach to the construction of classification and regression trees. Our primary concern is with the penalization strategy employed to prune back an initial, overgrown tree. We reason, based on both intuitive and theoretical arguments, that the pruning rule for classification should be different from that used fo...


Model-Based Classification Trees

The construction of classification trees is nearly always top-down, locally optimal and data-driven. Such recursive designs are often globally inefficient, for instance in terms of the mean depth necessary to reach a given classification rate. We consider statistical models for which exact global optimization is feasible, and thereby demonstrate that recursive and global procedures may result in ve...


Outlier Detection by Boosting Regression Trees

A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation along the boosting iterations and reiterate after removing it. The selection criterion is based on Tchebychev’s inequality applied to the maximum over the boosting iterations of ...



Journal

Journal title: SSRN Electronic Journal

Year: 2014

ISSN: 1556-5068

DOI: 10.2139/ssrn.2392953